The PENMAN Project on Knowledge-Based Machine Translation

Author

  • Eduard H. Hovy
Abstract

… of an integrated knowledge-based machine-aided translation system called PANGLOSS. The ISI-specific work includes the development of English sentence generation and sentence planning capabilities and the construction of an Ontology of concepts to act as the semantic lexicon for all modules of the system as a whole. In addition, we continue to enhance Penman's existing generation technology, to collect and develop ancillary knowledge sources and software (such as grammars or bilingual dictionaries and lexicons for German, Japanese, Spanish, and Chinese), and to maintain and distribute Penman.

RECENT RESULTS

During the past year, the generation component of PANGLOSS was installed; PANGLOSS was tested during the first DARPA MT evaluation. This work necessitated the development of code to transfer the output of New Mexico's ULTRA parser to a form suitable for Penman.

More recently, Penman Project members have been working on the semi-automated construction and acquisition of an Ontology for PANGLOSS. A high-level taxonomy of the basic concepts required for the processing of ULTRA, the CMU software, and Penman was synthesized out of several sources; this 400-odd node taxonomy we call the Ontology Base (OB). Current work involves migrating wordsense names from LDOCE into WordNet using several automatic techniques and then taxonomizing fragments of WordNet under the OB; at the present time, approx. 11,000 concepts have been so taxonomized and another 10,000 are awaiting final placement. Our goal is an Ontology organized under the OB of approx. 50,000 items. Toward this goal we acquired WordNet from Princeton and an online copy of Roget's thesaurus.

Ramping up toward making the Ontology support processing of other languages, we have been collecting multilingual resources of various types. We have acquired an online Japanese-English dictionary (approx. 50,000 entries with phrases), several Chinese-English online dictionaries (approximately equal total size), and are in the process of acquiring the Collins bilingual Spanish-English dictionary. We have also established X-windows based display capabilities for Japanese and Chinese, including a Japanese emacs editor and dictionary access interface.

In other work, the core mapping engine of the Sentence Planning module of PANGLOSS has been constructed and is currently being debugged. The Sentence Planner converts representations of texts written in the Pangloss Interlingua into SPL expressions suitable for Penman.

PLANS FOR THE COMING YEAR

Three principal efforts are planned for the coming year: the construction of the 50,000-node Ontology, the development of English, Japanese, and Spanish lexicons associated with the Ontology, and the development and …


Similar Resources

The Penman Natural Language Project Systemics-Based Machine Translation

The development of an integrated knowledge-based machine-aided translation system based on Systemic Linguistics. Parts of the system are to function as modules to be incorporated in the MAT system being codeveloped with CMU and CRL. Our work involves the enhancement of Penman's existing parsing technology to match the level of the language generation system; the development of ancillary knowledg...


In-Depth Knowledge-Based Machine Translation

The development of an integrated knowledge-based machine-aided translation system called PANGLOSS in collaboration with the Center for Machine Translation (CMT) at CMU and the Computing Research Laboratory (CRL) at New Mexico State University. The ISI part of the collaboration is focused initially on providing the system's output capabilities, primarily in English and then in other languages, ...


A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with a hyphen separating the parts. Persian has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space leads to some s...


A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...


Team-Based Integrated Knowledge Translation for Enhancing Quality of Life in Long-term Care Settings: A Multi-method, Multi-sectoral Research Design

Multi-sectoral, interdisciplinary health research is increasingly recognizing integrated knowledge translation (iKT) as essential. It is characterized by diverse research partnerships, and iterative knowledge engagement, translation processes and democratized knowledge production. This paper reviews the methodological complexity and decision-making of a large iKT projec...




Publication date: 1993